DOCUMENT RESUME 



ED 038 313 



SE 008 276 



AUTHOR        McCormick, William T., Jr.; And Others
TITLE         Identification of Data Structures and Relationships
              by Matrix Reordering Techniques.
INSTITUTION   Institute for Defense Analysis, Arlington, Va.
REPORT NO     RP-312
PUB DATE      Dec 69
NOTE          141p.

EDRS PRICE    EDRS Price MF-$0.75 HC-$7.15
DESCRIPTORS   *Algorithms, *Data Analysis, Data Processing,
              Mathematical Applications, Mathematical Models,
              *Mathematics, *Research Methodology, *Statistical
              Analysis



ABSTRACT



Presented are the results of a study conducted to 
develop algorithms for ordering and organizing data that can be 
presented in a two-dimensional matrix form. The purpose of the work 
was to develop methods to extract latent data patterns, grouping, and 
structural relationships which are not apparent from the raw matrix 
data. The algorithms developed are (1) Moment Ordering Algorithm, (2) 
Moment Compression Algorithm, and (3) Bond Energy Algorithm. They are 
applicable to a variety of problems involving multivariate data 
analysis. (RS) 



U.S. DEPARTMENT OF HEALTH, EDUCATION & WELFARE
OFFICE OF EDUCATION

THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM THE
PERSON OR ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR OPINIONS
STATED DO NOT NECESSARILY REPRESENT OFFICIAL OFFICE OF EDUCATION
POSITION OR POLICY.






RESEARCH PAPER P-512 



IDENTIFICATION OF DATA STRUCTURES 
AND RELATIONSHIPS BY MATRIX REORDERING TECHNIQUES 



William T. McCormick, Jr. 
Stephen B. Deutsch 
John J. Martin 
Paul J. Schweitzer 



December 1969 



IDA

INSTITUTE FOR DEFENSE ANALYSES
SYSTEMS EVALUATION DIVISION
400 ARMY-NAVY DRIVE
ARLINGTON, VIRGINIA 22202






IDA Log No. HQ 69-10829
Copy of 150 copies 






The information contained in this publication was developed under the 
IDA's independent research program. Its publication by IDA does not 
imply endorsement by the Department of Defense or other Government 
agency nor should the contents be construed as reflecting the official 
position of any U.S. Government office. 



3726 - 0152 - 3/70 






FOREWORD 



This paper presents the results of a study undertaken 
to develop methods for ordering and organizing technical, 
social, economic and other data that can be presented in array 
form. The study leading to the development of this report 
was conducted as independent research at the Institute for 
Defense Analyses. The theory and development of the 
algorithms described in this paper are the work of members of 
the Systems Evaluation Division. 



ABSTRACT 



This research paper presents the results of a study conducted to develop algorithms for 
ordering and organizing data that can be presented in a two-dimensional matrix form. The only 
restriction imposed on the analysis was that the rows and columns of the raw input data 
matrices could only be reordered, thus preventing the creation of artificial coefficients or loss 
of essential input information. The purpose of this work was to develop methods to extract 
latent data patterns, groupings, and structural relationships which are not, in general, apparent 
from the raw matrix data. 

Three distinct algorithms were developed and are presented in detail within the report. 
They have been applied to a variety of examples from the social and technical sciences which 
will also be discussed. The first method developed, the Moment Ordering Algorithm, has 
proven to be an effective technique for uncovering and displaying a dominant univariate 
relationship between the two sets of entities that lie along the vertical and horizontal axes of a 
matrix. The second method, the Moment Compression Algorithm, is designed to factor 
decomposable matrices by proper reordering but was not applied extensively because of its 
complex and time-consuming solution. The last method developed, the Bond Energy 
Algorithm, was found to be applicable to a broader class of problems than the first two 
methods and is able to efficiently organize, group, and interrelate data of considerably more 
complex structure. 

It will be shown that the techniques developed in this work are applicable to a variety 
of problems involving multivariate data analysis and, when used, can often significantly 
augment the level of understanding and comprehension of complicated multivariate 
relationships. 

I. INTRODUCTION 



Since the introduction of large digital computers, methods of multivariate analysis¹ have
been developed that use the computational resources and characteristics of the computer more
effectively than some of the more conventional and established statistical techniques. These
new methods are being employed because it is now possible to undertake data analysis problems
in considerably greater detail than was previously feasible. A class of techniques that is able
to account for detailed individual relationships as well as macroscopic data structure is
exemplified by the cluster-seeking² methods. Ball (Ref. 1) has pointed out that many classical
statistical techniques depend heavily on statistical quantities estimated from the data and that
this “averaging” can sometimes lead to erroneous conclusions. Microscopic variations in the
data cannot, in general, be detected from the estimated statistical quantities, so small but
significant information can be overwhelmed and even lost under the pressure of larger
statistical trends. Furthermore, many of these classical techniques, such as principal component
analysis (Ref. 28) or factor analysis (Ref. 28), implicitly assume data distributions that are
not always present. Thus there appears to be a definite need for better direct analysis
techniques, so that it is not necessary to rely completely on functions of the data or on
assumptions regarding their distribution.

This paper presents three new direct data analysis techniques that were developed at
the Institute for Defense Analyses. One of the algorithms, the Bond Energy Algorithm, shares
a few of the same objectives as some of the other cluster-seeking techniques (Refs. 2 through
20) but has several important differences and advantages. The Moment Ordering Algorithm has
as its principal goal the discovery of a single dominant relationship in the data, while the
Moment Compression Algorithm attempts to factor the data into separable pieces or clusters.
Two important characteristics that all three of these methods share are that they operate
directly on the non-negative raw input matrix data and that they reorganize and reorder the
matrix data by performing row and column permutations in order to reveal obscure and
potentially informative data patterns. The output of all these algorithms, then, is a new data
matrix with its resulting new ordering.

1. Multivariate Analysis includes such mathematical techniques as Regression Analysis, Factor
   Analysis, Principal Component Analysis, Canonical Analysis, Cluster Analysis, etc.

2. Cluster-seeking techniques are those data analysis methods which seek to identify groups of
   similar entities.
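
As a minimal illustration of the reordering-only restriction described above (and not of any of the three algorithms themselves), the following Python sketch permutes the rows and columns of a small matrix. The function name `reorder`, the 4 x 4 example matrix, and the hand-picked permutations are all hypothetical and do not come from the paper. The sketch only shows that a permutation rearranges the entries without creating or destroying any of them.

```python
def reorder(matrix, row_order, col_order):
    """Return a new matrix whose rows and columns follow the given orderings.

    Entries are only moved, never altered, so no artificial coefficients
    are created and no input information is lost.
    """
    return [[matrix[i][j] for j in col_order] for i in row_order]


# Hypothetical non-negative raw data matrix: two groups of rows and
# columns are interleaved, so the grouping is not obvious at a glance.
raw = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]

# Permutations chosen by hand for illustration; the paper's algorithms
# are what would search for such orderings automatically.
row_order = [0, 2, 1, 3]
col_order = [0, 2, 1, 3]

reordered = reorder(raw, row_order, col_order)

for row in reordered:
    print(row)

# The same entries are present before and after the reordering.
flat_raw = sorted(value for row in raw for value in row)
flat_new = sorted(value for row in reordered for value in row)
print(flat_raw == flat_new)
```

With this hand-chosen ordering the interleaved entries gather into two diagonal blocks, while the sorted list of entries is unchanged.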

In Chapter II, the most important features and characteristics of each of the three 
algorithms will be briefly described. Then, in Chapter III, the major results and conclusions of 
this study will be presented. Part II of this paper contains a detailed description of the theory 
and development of the three algorithms along with a number of pertinent examples which 
illustrate the favorable characteristics and general applicability of these methods. 



